Bilingual corpus for AVASR using multiple sensors and depth information
Authors
Abstract
In this paper we present the Bilingual Audio-Visual Corpus with Depth information (BAVCD). The database contains utterances of connected digits, spoken by 15 subjects in English and 6 subjects in Greek, and collected employing multiple audio-visual sensors. Among them, of particular interest is the use of the Microsoft Kinect device, which is able to capture facial depth images using the structured light technique in addition to the traditional RGB video. The database allows conducting research on multiple aspects of small-vocabulary audio-visual automatic speech recognition, such as the use of visual depth information for speechreading, fusion of multiple video and audio streams, and language dependencies of the task. Preliminary results on the corpus are also presented.
Similar resources
Textual Representations for Corpus-based Bilingual Retrieval
Paul McNamee, Doctor of Philosophy, 2008. Thesis directed by Charles K. Nicholas, Professor, Department of Computer Science and Electrical Engineering. The traditional approach to information retrieval is based on using words as the indexing and search terms for documents. However, word-based representations have diffic...
Phrase Alignment Based on Combination of Multiple Strategies
Phrase translation pairs are very useful for bilingual lexicography, machine translation systems, cross-lingual information retrieval, and many other applications in natural language processing. The parse trees of sentences contain phrase boundary information. Linguistic knowledge in translation and semantic lexicons, and statistical results from a bilingual corpus, can be used to align Chinese wo...
A robust audio-visual speech recognition using audio-visual voice activity detection
This paper proposes a novel speech recognition method combining Audio-Visual Voice Activity Detection (AVVAD) and Audio-Visual Automatic Speech Recognition (AVASR). AVASR has been developed to enhance the robustness of ASR in noisy environments, using visual information in addition to acoustic features. Similarly, AVVAD increases the precision of VAD in noisy conditions, which detects presence ...
Combining Machine Readable Lexical Resources and Bilingual Corpora for Broad Word Sense Disambiguation
This paper describes a new approach to word sense disambiguation (WSD) based on automatically acquired word sense division. The semantically related sense entries in a bilingual dictionary are arranged in clusters using a heuristic labeling algorithm to provide a more complete and appropriate sense division for WSD. Multiple translations of senses serve as outside information for automatic tag...
Tibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features
Sentence-level alignment of bilingual parallel corpora plays a significant and indispensable role in machine translation, translation knowledge acquisition, and bilingual lexicography research, and it is fundamental work for natural language processing. Given the great deal of work on sentence alignment and the variety of methods developed for bilingual terminology extraction, those are...